amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep by OpenCircuitDev · Pull Request #67 · OpenCircuitDev/opencircuitmodel

OpenCircuitDev · 2026-06-11T20:45:41Z

What's in here (5 commits on top of main)

Dropbox→Git migration snapshot (e4613de, pre-existing on this branch)
bench/isolation/memory/amnesia-ab — first memory sandbox to RUN, verdict CONFIRMED (87c19c8)
- memory-ON fact recall 94.2% (confirm ≥70) · retrieval hit rate 100% (confirm ≥80) · memory-OFF sanity 2.5% (must be ≤25)
- llama3 8B Q4 + mxbai-embed-large, 62-memory corpus w/ cross-project distractors, 20 tasks, objective key-fact scoring
- OFF-arm failure mode is confident fabrication, not ignorance — the memory loop is the product, demonstrated
Ollama backend adapter in ocm-inference (ad2162a) — native NDJSON /api/chat, health via /api/tags, max_tokens→num_predict; parser tests pinned to verbatim live-daemon captures. Selector untouched (daemon settings wiring is the follow-up).
README correction (1121e21) — registry is 3/3 SHA256-verified since chore: drop unhashed Qwen3 entries; spec blockers cleared #50; the '5 GGUFs / open hash blocker' claims were stale.
v0.1.1 draft release notes (5c60cb5) — docs/release-notes/v0.1.1-draft.md.

Verification

amnesia-ab: full prompt/output/score rows in bench/isolation/memory/amnesia-ab/results/run-2026-06-11T20-32-21.json
Rust changes: CI matrix (fmt + clippy + test × ubuntu/macos/windows) on this branch — run 27376070667

🤖 Generated with Claude Code

The cheapest discriminating test of the central loop (spec row 9, library- driven retrieval) as a faithful miniature: mxbai-embed-large cosine top-5 -> inject -> llama3 8B Q4 via Ollama, 62-memory corpus w/ cross-project distractors, 20 tasks, objective key-fact scoring. Measured (results/run-2026-06-11T20-32-21.json): memory_on_fact_recall_pct 94.2 (confirm >=70) retrieval_hit_rate_pct 100.0 (confirm >=80) memory_off_fact_recall_pct 2.5 (sanity <=25 — corpus not guessable) latency p50 on/off 19.5s / 12.3s OFF-arm failure mode is confident fabrication, not ignorance — the memory loop is the difference between correct specifics and plausible lies on an 8B model. Per the decision rule this justifies: the Ollama backend adapter, activating mem0-v3-locomo, and cutting v0.1.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… Ollama users Third InferenceBackend: bridges OCM to an existing Ollama daemon via its native NDJSON /api/chat API (model tag required per-request; max_tokens maps to options.num_predict; health via /api/tags). Selector untouched — explicit construction for now; daemon settings wiring is the follow-up. Parser test fixtures are VERBATIM captures from a live Ollama daemon (llama3, 2026-06-11) — pinned to the real wire format. Motivated by the amnesia-ab sandbox CONFIRMED verdict (94.2% fact recall on this exact daemon + model class). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… blocker resolved The 'five model SHA256 hashes' pre-release blocker was cleared when the unhashed Qwen3 entries were dropped (#50); the shipping registry is 3 models, all hashed. README still claimed 5 GGUFs + an open blocker. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ose + bench.py The dry-run validator requires docker-compose.yml + bench.py for ACTIVE sandboxes (caught by Bench Framework CI on PR #67 — the framework doing its job). bench.py delegates to run.mjs (ONE harness, the exact artifact that produced the CONFIRMED result); compose runs it in node:22-slim against the HOST Ollama daemon via host-gateway, same host-dependency pattern as vllm-q4-llama8b. run.mjs now honors OLLAMA_URL. Local validate_compose: PASS. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…row 9 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Brand and others added 7 commits May 30, 2026 19:46

Commit uncommitted work before Dropbox->Git migration (2026-05-30)

e4613de

docs(release): v0.1.1 draft notes — first measured-evidence release

5c60cb5

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

docs(bench): regenerate coverage + metrics — amnesia-ab maps to spec …

7b691cc

…row 9 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

OpenCircuitDev merged commit b65cfcb into main Jun 11, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep#67

amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep#67
OpenCircuitDev merged 7 commits into
mainfrom
feat/bench-coverage-dashboard-trends

OpenCircuitDev commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

OpenCircuitDev commented Jun 11, 2026

What's in here (5 commits on top of main)

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant